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We present an efficient algorithm for calculating spectral properties of large sparse Hamiltonian 
matrices such as densities of states and spectral functions. The combination of Chebyshev recursion 
and maximum entropy achieves high energy resolution without significant roundoff error, machine 
precision or numerical instability limitations. If controlled statistical or systematic errors are accept- 
able, cpu and memory requirements scale linearly in the number of states. The inference of spectral 
properties from moments is much better conditioned for Chebyshev moments than for power mo- 
ments. We adapt concepts from the kernel polynomial method, a linear Chebyshev approximation 
with optimized Gibbs damping, to control the accuracy of Fourier integrals of positive non-analytic 
functions. We compare the performance of kernel polynomial and maximum entropy algorithms for 
an electronic structure example. 
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I. INTRODUCTION 



Many computational physics problems involve very 
large sparse Hamiltonian matrices. If N is the number of 
states, finding all eigenvectors and eigenvalues requires 
cpu time scaling as N 3 and memory scaling as N 2 . For 
individual eigenstates the preferred method is Lanczos 
diagonalization, which uses only matrix-vector-multiply 
operations and requires cpu and memory scaling as N. 
Densities of states and spectral functions for finite dimen- 
sional Hamiltonians are sums of ^-functions with positive 
amplitudes. In the thermodynamic limit of relevance to 
condensed matter physics, these can extrapolate to sin- 
gular structures such as isolated states, band edges and 
Van Hove singularities. New linear scaling methods are 
needed for calculating such spectral properties which in- 
volve many eigenstates, and for quantities derived from 
them including thermodynamics, total energies for elec- 
tronic structure and forces for molecular dynamics. Lim- 
ited energy resolution and statistical accuracy are often 
acceptable provided uncertainties eaa be quantified. 

The maximum entropy methoauu is a popular ap- 
proach: Maximize the information theoretic relative en- 
tropy of the spectrum subject to data constraints. The 
input data are usually power moments. Maximum en- 
tropy spectra are strictly positive. Maximum entropy 
spectra are the solution of a convex non-linear optimiza- 
tion problem. Maximum entropy always yields broad- 
ened representations of the true spectra. The resolution 
function is non-uniform and unknown, with some parts 
of a spectrum converging more rapidly than others as 
the number of moments increases. Occasionally, maxi- 
mum entropy yields spurious structure; for example, it 
can 'ring' in smooth regions of a spectrum near to a Van 
Hove singularity. Non-analytic features are better ap- 
proximated at higher energy resolution, which is achieved 
in maximum entropy by fitting more moments. However, 



as moment order increases, the calculation of power mo- 
ments is more sensitive to machine precision limits, and 
the optimization problem is more ill-conditioned. Max- 
imum entropy is difficult to implement for more than 
about fifty power moments. n n 

The kernel polynomial methodau is much easier to im- 
plement for high energy resolution applications. It is a 
linear Chebyshev approximation to spectra using Cheby- 
shev moment data. Abrupt truncation of Chebyshev 
series results in the Gibbs phenomenon: a lack of uni- 
form convergence at non-analytic (or singular) features in 
spectra. Instead, the moments of the kernel polynomial 
approximation are the data multiplied by Gibbs damp- 
ing factors, which are chosen to ensure positive spectra 
with the highest energy resolution. A kernel polyno- 
mial approximation is a convolution of the true spectrum 
with a known positive kernel polynomial function. It can 
be rapidly evaluated by fast Fourier transform without 
non-linear optimization. In contrast to Lanczos meth- 
ods, Chebyshev recursion is numerically stable without 
accumulation of roundoff error; thus, there is no. need 
for computationally expensive reorthogonalizatioro For 
sparse Hamiltonians, the computational cost for gener- 
ating Chebyshev moment data is linear scaling if cq 
trolled systematic or statistical errors are acceptable!! 
Chebyshev approximations have been applied recently to 
densities of states and spectral functions in diverse ar- 
eas of condensed matter physics including the Heisen- 
berg antiferromagnetu, the Holstcin— £ — J modelta, the 
dielectric constants of quantum dotst3, linear Sjcaiing al- 
gorithms for tight binding molecular dynamicsExEl, non- 
orthogonal electronic structured, and so on. Chebyshev 
approximations have also been developed independently 
for scattering problems in quantum chemistryEa Ea. 

A comparison of the maximum entropy and kernel 
polynomial methods reveals advantages for each. (A 
comparison of Lanczos methods with kernel polynomial 
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methods may be found in Silver, et ala .) The maxi- 
mum entropy method achieves significantly higher en- 
ergy resolution, requiring calculation of four to ten times 
fewer moments for typical applications. However, the 
non-linear optimization problem can be difficult to solve, 
the resolution is non-uniform and unknown, and there is 
a risk of artifacts. The kernel polynomial method has sig- 
nificantly poorer energy resolution. However, non-linear 
optimization is not needed, the resolution function is uni- 
form and known, and there is no risk of artifacts. 

Our experience is that, in most cases, the computa- 
tional cost of generating moment data limits the ability 
to do calculations. The practical necessity to use compu- 
tational resources in the most efficient way motivates our 
development of a new maximum entropy algorithm based 
on Chebyshev moment data. Chebyshev moments have 
several advantages over power moments for a maximum 
entropy algorithm: 

• Machine Precision Limitations - In a power mo- 
ment information in low digits past the decimal 
point is redundant with information in low order 
moments. New information is contained in higher 
digits whose cardinality increases with the order of 
the moment. Thus, machine precision puts a limit 
on how many power moments are useful to calcu- 
late. In contrast, there is no redundancy in mp* 
ments constructed from orthogonal polynomials^ 
and machine precision is not limiting. 

• Conditioning - The ill-posed inverse problem of in- 
fering a spectrum from a limited number of Cheby- 
shev moments is much better conditioned than 
from the same number of power moments. In par- 
ticular, the Hessian for maximum entropy opti- 
mization has a much flatter eigenvalue spectrum 
for Chebyshev moments than for power moments. 

• Computational Efficiency and Accuracy - A sim- 
ple coordinate transformation converts a Cheby- 
shev series to a Fourier series, which enables use 
of fast Fourier transform methods. 

In summary, the combination of Chebyshev recursion and 
maximum entropy should provide an efficient stable al- 
gorithm capable of reaching arbitrarily high energy reso- 

lution ' Ein 

There is an extensive literatureE3~Ej on convex non- 
linear optimization applied to maximum entropy. For our 
applications, we find the principal new algorithmic diffi- 
culty to be control of the numerical accuracy of Fourier 
integrals when the true spectra have singular (or non- 
analytic) features such as 5-functions. We adapt con- 
cepts from the kernel polynomial method and the Shan- 
non sampling theorem to solve this numerical accuracy 
problem. The resulting algorithm has no difficulty han- 
dling thousands of Chebyshev moments, if necessary. 

In Section II we briefly review methods for generat- 
ing Chebyshev moment data. In Section III we describe 



the kernel polynomial method. In Section IV we present 
our maximum entropy algorithm. In Section V we illus- 
trate the method using an electronic structure example, 
comparing the performance of the maximum entropy and 
kernel polynomial methods. In Section VI we conclude. 

II. THE GENERATION OF CHEBYSHEV 
MOMENT DATA 

Consider a density of states as representative of the 
spectral properties of interest. The first step is to scale 
the Hamiltonian, H = aX + b such that all eigenvalues 
x n of X satisfy —I < x n < +1. These endpoints are 
rapidly computed, for example, by Lanczos methods us- 
ing the same matrix-vector-multiply operations required 
for generating Chebyshev moments. The only difference 
between the kernel polynomial method and the maximum 
entropy method is, in order to minimize endpoint correc- 
tions in fast Fourier transform evaluation of Fourier in- 
tegrals, we recommend placing all x n well inside —1 and 
+1, for example —.99 < x n < +.99. This point will be 
discussed further in Section IV. 

The density of states is then 

I N 

Data about D(x) consist of Chebyshev moments, 

p m =Tr{T m (K)} = j T m {x)D{x)dx . (2) 

We use the notation /i m for a datum on a moment, even 
if this estimate is approximate. We calculate moments 
using the Chebyshev recursion relation, 

T m+1 (X) = 2XT m (X)-T m _i(X) , (3) 

which requires the same optimized matrix-vector- 
multiply algorithm used in Lanczos methods. Unlike 
Lanczos recursion, Chebyshev recursions are numerically 
stable to arbitrarily large numbers of recursions. Wc 
use rules for multiplying Chebyshev polynomials, e.g. 
T2m = 1T m T m — I, so that only M/2 matrix- vector- 
multiplies are needed to generate M moments. 

Exact evaluation of M moments requires cpu time pro- 
portional to 0(N 2 M/2) for sparse matrices. Generate 
T m (X)|i > for each basis state \i >. The estimator for 
Chebyshev moments is then 

o N 

tl2m = < i\T m pQT m (X)\i > -1 . (4) 

i=l 

There is a similar expression for m odd. 

Stochastic evaluationcl requires cpu time scaling as 
0(NMN r ). The estimator for Chebyshev moments is 
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(5) 



where the \r > are N r Gaussian random vectors. Such 
data have statistical variance proportional to (NNr)' 1 
which may be expressed directly in terms of the density 
of states. Estimation of statistical errors is described in 
Silver and Roderu. More sophisticated choices of ran- 
dom vector appear to reduce statistical varianceta, but 
they introduce unwanted statistical bias and make error 
estimation difficult. 

Local truncation evaluation of moments requires cpu 
time scaling as O(NMJ). Here, moments are calculated 
with a locally truncated Hamiltonian, H;, where J is the 
number of states included in the truncation range. The 
estimator for Chebyshev moments is 



£ < i|T TO (Xi)|t > 



(6) 



This generates data with a systematic error determined 
by the truncation range. 'Logical' truncations appears 
to converge more raxddly and smoothly than 'physical' 
truncation schemesOcj. Local truncation may be ap- 
plicable if the density matrix has only local off-diagonal 
elements, as in tight-binding Hamiltonians for insulators. 
Exact moment derivatives needed to estimate forces for 
molecular dynamics can be calculated from a Chebyshev 
derivative formula. 

Cpu time and memory limit the number of moments 
M and their statistical and systematic errors. Fortu- 
nately, both stochastic and local truncation methods pro- 
vide means to estimate and control errors. 



III. THE KERNEL POLYNOMIAL 
APPROXIMATION 

The kernel polynomial method has two roles in this 
paper. First, it is a method to estimate spectra from 
Chebyshev moment data. Second, it provides our ap- 
proach to control numerical accuracy in the evaluation 
of Fourier integrals of non-analytic spectra in our maxi- 
mum entropy algorithm. 

An exact Chebyshev moment expansion of the density 
of states is 
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The kernel polynomial method truncates Eq. ((?]) at M 
moments, introduces a factor to damp Gibbs phe- 
nomenon, and substitutes (possibly inaccurate) data ju ro 
for the moments. The kernel polynomial approximation 
to a density of states is then 
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Let cj> = cos 1 (x). 
D{4>) = sinO)D(A'). 
Fourier integrals, 



Then T m (x) = cos(to0). Define 
The Chebyshev moments are then 



T m (x)D(x)dx 



cos(m0)D(0)d0 . (9) 



If the data are exact, Dk(<P) can be represented as both 
a simple convolution and a truncated Fourier series, 
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For later purposes, we emphasize that the Fourier inte- 
grals of Dk{4>) are fi m = %n9m- Thus, Dk{4>) does not 
fit the moment data. If the data are inexact, correspond- 
ing random variables should be added to Eq. (|ic|). 

The kernel &k{4>) is a 27r-periodic polynomial approx- 
imation to a Dirac delta function, analogous to the reso- 
lution function of a spectrometer. Resolution is uniform 
in (j> with width A(j> oc M -1 . If g% = 1, at large \<j>\ the 
kernel is oscillatory with period A(j) = ir/M within an en- 
velope function decreasing slowly as l/4> 2 - The result is 
the Gibbs phenomenon: a lack of uniform pointwise con- 
vergence of the cosine series at singular (or non-analytic) 
structures in the density of states. An optimal Xh&t 
minimizes the Gibbs phenomenon may be derivedB by 
requiring the kernel to be a strictly positive normalized 
polynomial of degree M with minimal variance in <fi. The 
result is 
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where 
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and 
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Here U v are Chebyshev polynomials of the second kind 
and 4>\ = 7r/(M + 2) The g^ decrease smoothly and 
monotonically from 1 to as m increases from to M. 
This kernel-was originally derived by minimizing the uni- 
form normP. Its envelope function decreases exponen- 
tially at large |0|. 

The kernel polynomial method is also applicable to 
spectral functionsO, 
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where O is the appropriate Hermitian operator. The cor- 
responding Chebyshev moments have the form =< 
\f r |O t T m (X)0|\I>o >. Silver et alia compare the per- 
formance of the kernel polynomial method to Lanczos 
methods for spectral functions. 

Applications of kernel polynomial approximations to 
thermodynamics use a rapidly converging Fourier-Bessel 
expansion of the partition functiono, 



-m> 



(15) 



The I m ((3a) are modified Bessel functions. The parti- 
tion function involves integral rather than pointwise con- 
vergence, so the optimal choice is no Gibbs damping, 
q M = 1 

Our maximum entropy algorithm uses the kernel poly- 
nomial approximation in a novel way. We employ fast 
Fourier transforms to evaluate Fourier integrals of D((f>) 
by summing over L + 1 pixels equally spaced in </>, cor- 
responding to sampling D((f>) at the L + 1 zeroes of 
cos((L + 1)0). The Shannon sampling theorem says that 
the only function which can be exactly evaluated by this 
procedure is a band limited function, a finite L'th order 
cosine series. So, in fact, our algorithm exactly evaluates 
Fourier integrals of a Chebyshev approximation. The 
procedures in Section II generate moments of a function 
consisting of a sum of N <5-functions with positive ampli- 
tudes, equivalent to an infinite order Chebyshev series. 
Inasmuch as, typically, N 3> L we should not expect to 
resolve all states. Our maximum entropy algorithm re- 
quires moments of an L-th order positive Chebyshev ap- 
proximation. The only Chebyshev approximation that 
can satisfy the positivity constraint required by maxi- 
mum entropy is a kernel polynomial approximation. Mo- 
ments of the kernel polynomial approximation are related 
to moments of the true function by Gibbs damping fac- 
tors. This subtle difference becomes important because 
we demand high numerical accuracy. We elaborate on 
these points in the next section. 



IV. THE MAXIMUM ENTROPY ALGORITHM 

This section presents our maximum entropy algorithm. 
Although it may be regarded as an. adaptation of previ- 
ous maximum entropy algorithmstj E3, our problem has 
novel issues of numerical accuracy for D(x) that contain 
singular (non-analytic) structures such as ^-functions. 

Consider the case where the data are subject to Gaus- 
sian uncertainties, 
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Here r\ is a random variable and E denotes the statisti- 
cal expectation value of the random variable following it. 
The x 2 statistic for measuring quality of fit is then 
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In case of exact moment data, set a m to the numerical 
precision required, which can be very small. The m = 
term is included to constrain normalization, jEto = 1. 
Taking the limit <7 — * strictly enforces normalization. 
In our applications using 32 bit computers, 6th or 7th 
digits past the decimal point of moments often contain 
important information. We typically drop \ 2 by 12 to 
14 orders of magnitude below its starting values during 
the course of converging to a maximum entropy solution. 
Such high numerical accuracy can be critical to avoid 
spurious artifacts and to yield the correct physics. 

Therefore, very careful attention to numerical accuracy 
is required in evaluating Fourier integrals, Eq. (^|). To 
have an efficient maximum entropy algorithm, we eval- 
uate Fourier integrals by fast Fourier transform. This 
equals a sum over equally spaced points in 4>, 
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The L + l 4>i satisfy cos((L + l)<j>i) = where < I < L. 
The Shannon sampling theorem says this approximation 
becomes exact only if D(<p) is a band limited function 
of degree L. But the exact D{<j>) in our applications are 
sums of 5- functions with positive amplitudes, so evalua- 
tion of Fourier integrals by fast Fourier transform with 
a finite number of pixels L is not exact. The maximum 
entropy D(<f>) also correspond to infinite order Fourier se- 
ries, so evaluation of their Fourier integrals by this pro- 
cedure is not exact either. 

Our strategy to minimize numerical errors is to min- 
imize high frequency components of the maximum en- 
tropy solution. The goal of our algorithm is to find a 
kernel polynomial approximation for M x K moments 
instead of M moments, where K is some integer. Maxi- 
mum entropy provides the criterion for extrapolating the 
moment series. But the moments of the kernel polyno- 
mial approximation of degree M X K are fi m g^ x , so 
these are what we should use as data in our x 2 criterion. 
Modifying the data in this way ensures that our target 
spectrum is positive and satisfies the Hausdorff condi- 
tions for the existence of a maximum entropy solution. 
Choosing K in our algorithm is equivalent to choosing 
the desired energy resolution. If our maximum entropy 
solution was in fact equivalent to a higher resolution ker- 
nel polynomial approximation, choosing the number of 
pixels L = M x K would yield exact Fourier integrals. 
But inasmuch as our maximum entropy solution is not ex- 
actly band limited, we choose the number of pixels some 
integer factor I larger than M x K, i.e. L = M X K X I. 
The extra factor I further reduces numerical errors. 

Increasing L to improve numerical accuracy must be 
balanced against increased computational resources re- 
quired for the fast Fourier transform. Cpu time scales 
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as O(LlnL) and memory scales as 0(L). Typically, we 
find maximum entropy improves resolution by factors of 
4 to 10 over the kernel polynomial method, so most of 
the gain is obtained by choosing 4 < K < 10. The cor- 



responding g. 



MxKi 



s for < m < M are only slightly 



smaller than one, but this difference is enough to deter- 
mine whether our algorithm converges to the stopping 
criterion or stalls at high x 2 ■ Without the Gibbs damp- 
ing correction, convergence may be very non-uniform and 
in some regions approach an energy resolution that can 
not be described with the number of pixels chosen. With 
the Gibbs damping correction, the dynamic range of the 
resolution improvemnt is limited and can be handled with 
the number of pixels chosen. Wc also typically find the 
choice / > 4 to be sufficient to fit Chebyshev moments 
to 7 digits accuracy. 

Endpoint corrections are another concern in evaluating 
Fourier integrals. They are often essential to get reason- 
able convergence for high order moments. Sophisticated 
approaches to this problem have been developed based on 
interpolation schemesEj. However, within our algorithm 
we have the option to minimize endpoint corrections at 
the outset, by forcing the spectrum to be close to zero 
near endpoints = and 4> = n. The easy way to force 
D((/)) toward zero at endpoints is to scale the Hamilto- 
nian, H = aX + b such that all eigenvalues x n of X lie 
between, say —.99 and +.99, rather than —1 and +1. 
This change has only a 1% impact on </> resolution, but it 
avoids the endpoint corrections and the bias of interpola- 
tion schemes. This justifies the recommendations made 
in Section II for scaling the Hamiltonian. We often find 
this superficially small correction to be critical to achiev- 
ing high accuracy fits without stalling. 

Our optimization problem is to maximize the relative 
entropy, 
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subject to data constraints. S is strictly negative and 
equals zero only when D = D a . Here D (4>) is a default 
model for the density of states in the absence of data. 
We obtain faster convergence with less risk of spurious 
artefacts by using prior knowledge to choose a default 
model closer to the final answer. In the absence of prior 
knowledge, we usually use the kernel polynomial approx- 
imation as the default model, inasmuch as the motivation 
for maximum entropy is to improve energy resolution be- 
yond the kernel polynomial method. 

More specifically, the primal optimization problem is 
to maximize 
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as a function of a continuous variable D(<fi). The statis- 
tical regularization parameter a sets a balance between 



the fit, measured by \ 2 , and an information measure, 
—S, of distance between the inferred D(4>) and the de- 
fault model D (<fi). Alternatively, we regard 1/a as a 
Lagrange multiplier enforcing a constraint on \ 2 ■ 

Our algorithm consists of three nested loops: the outer 
loop iteratively decreases a starting from a high value, 
until a stopping criterion is reached; the middle loop for 
each a iteratively solves for the D(<fi) that maximizes 
Q p \ the inner loop at each a and D(<p) solves for the 
next update of D(<f>) using linear equation solvers such 
as conjugate gradients. We discuss each of these loops in 
turn. 

The outer loop typically starts at large a\ as Xo> the 
X 2 of the default model D D . Then progress geometri- 
cally down in a, e.g. otk+i = (Xk/2. The corresponding 
X 2 decreases and the information, —S, increases. If the 
middle loop is unstable, as measured by a significant in- 
crease in x 2 , go back to conditions at the start of this 
loop, halve the step down in a, and iterate until stability 
is reached. Popular stopping criteria for a are x 2 = M 
and x 2 ~ 2a5 f = M, although many other criteria are 
discussed in the literature. Once the stopping criterion 
is passed, perform a golden search for the optimal a. We 
often find that the information —S saturates at an a- 
independent value as a decreases, so that the outer loop 
may be stopped earlier. 

In principal, the middle loop solves the primal opti- 
mization problem, 
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This is a convex optimization problem having a unique 
answer. Unfortunately, this approach is difficult because 
this problem statement is written in terms of a continuous 
positive variable D(<f>) which varies typically by many or- 
ders of magnitude. In practice, the number of variables to 
optimize would be the number of pixels L chosen, which 
for reasons of numerical accuracy we discussed earlier is 
usually a quite large number. However, a dual optimiza- 
tion problemEB solves the same problem, is more stable 
numerically, and is easier to implement than the primal 
problem. It requires only M <^ L parameters A defined 
by 



f_i m ji Ti 



(22) 



Then the maximum entropy D(<p) satisfying Eq. ( pl| ) is 

n i 

cos(m<j>) . (23) 

\ m=0 / 

This form is also obtained by maximizing entropy sub- 
ject to Lagrange contraints on moments with Lagrange 



5 



multipliers A. The dual optimization problem is to max- 
imize 



Q d = \n[ I D(<t>)tht> 
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as a function of the set of A m . Mapping onto the dual 
space of Lagrange multipliers reduces an infinite dimen- 
sional optimization problem to a feasible finite dimen- 
sional problem. Away from the maximum, define 
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Eq. ( |22] ) is satisfied when £ m = 0. 

The middle loop of our algorithm solves Eq. ( p2|) by 
Newton-Raphson iteration. Beginning with some start- 
ing A , the n+ l'st step is 



H n (\ n+l - A") =f 



(26) 



Here 7i is the Hessian of the dual problem, which is a 
positive definite M x M matrix and a simple function of 
the moments, 



Hi; 



Then, 
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We have Qd > Q P in Eq. (|28| ), and as the itera- 
tion proceeds (Q p ) approach Qoo from above (be- 
low). Hence converging bounds at the n'th iteration are 
Q n d > Q°° > Qp where Q°° = Iim^ 00 {Q3, These 
bounds provide stopping criteria for the middle loop. We 
typically stop at Qd = 1.02Q P . 

The inner loop solution of Eq. (|2^) by linear equa- 
tion solvers such as conjugate gradients. This task can 
be handled by standard packages such as E1SPACK. 
An advantage of Chebyshev moments is that the spec- 
trum of eigenvalues of the Hessian in Eq. (^7|) is al- 
most flat, whereas the eigenvalue spectrum for power mo- 
ments is steeply decreasing. Thus, this problem is well- 
conditioned for Chebyshev moments but can easily be- 
come ill-conditioned for power moments as M increases. 

We find the cpu time required by our algorithm scales 
as 0(ML ln(-L)). But it remains negligible compared to 
the cpu time required to generate moment data for most 
problems. As we stated before, use of the maximum en- 
tropy method usually cuts overall cpu requirements by 
at least a factor of 4 over the kernel polynomial method. 
Isolated features in spectra, such as individual states and 



band edges, usually converge much faster, up to a factor 
of 10 or more. 

A few words should be said about data generated by 
stochastic methods in Eq. (||) . Calculation of the covari- 
ance matrix C for such data is described in an earlier 
paperu. Its structure is the same as the Hessian in Eq. 
(p7|). The appropriate generalized \ 2 statistic is 



x a = (M-M) t c- 1 0i-/i) 



(29) 



There is a cancellation, leading to an effective Hessian 
essentially proportional to a unit matrix and independent 
of D. This property further facilitates finding maximum 
entropy solutions for data generated by the stochastic 
method. 

Energy derivatives needed for molecular dynamics can 
be derived for maximum entropy using the same expres- 
sions for exact derivatives of moments. The statistical 
error for stochastic methods using Gaussian random vec- 
tors can easily be accomodated, because the covariance 
of moments is proportional to the Hessian. Details of 
these extensions will be presented elsewhere. 
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FIG. 1. Density of states of an orthogonal tight binding 
model for amorphous diamond calculated with the maximum 
entropy method using 1024 exact Chebyshev moments. 



V. AN EXAMPLE 

We illustrate the performance of our MEM algorithm 
with the example of an orthogonal tight binding calcu- 
lation of the density-, of states of amorphous diamond by 
Dong and Draboldc3. The purpose of their study was to 
examine the localized to extended transition in band-tail 
states in an amorphous semiconductor. We refer readers 
to their paper for a discussion of the physics. Here, we 
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are only interested in a comparison of methods for esti- 
mating densities of states. We note that their maximum 
entropy calculations used Chebyshev moments but were 
limited to about 90 moments because of the difficulties 
in implementing a maximum entropy algorithm. 
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FIG. 2. Comparison of portions of the density of states 
in Fig. 1 calculated by maximum entropy for 128 and 1024 
moments. 
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FIG. 3. Comparisons of portions of the density of states in 
Fig. 1 calculated by the kernel polynomial method (KPM) for 
1024 moments and by the maximum entropy method (MEM) 
for 1024 moments. 



With the algorithmic improvements presented in this 
paper, we demonstrate that maximum entropy calcula- 
tions with thousands of moments are now feasible. 

The Hamiltonian considered has 16384 states. We 
show maximum entropy results for the goal of improv- 
ing resolution by a factor of 8 over the kernel polynomial 
method. Hence, we choose K — 8. We set I = 4 to 
minimize numerical errors in the evaluation of Fourier 
integrals. We use up to M — 1024 exact Chebyshev mo- 
ments. Hence the number of pixels L — M x K x I < 
32768. 

Fig 1 shows the full density of states obtained using 
maximum entropy for M = 1024. Note the optical gap 
in the density of states. Fig. 2 shows a comparison of 
maximum entropy results for M — 128 and M = 1024 
for two portions of this spectrum. On the left is a region 
where the states are dense, and on the right is a region 
inside the gap where the states are sparse. M = 128 is a 
factor of two or three larger than the maximum number 
of moments that can be handled by maximum entropy 
algorithms using power moments. Our algorithm clearly 
is able to handle many more moments and to achieve 
much higher energy resolution. Figure 3 shows the same 
comparison between kernel polynomial method and max- 
imum entropy with M = 1024 moments. Again maxi- 
mum entropy provides a dramatic improvement in energy 
resolution over the kernel polynomial method. 

VI. CONCLUSION 

We have described an efficient algorithm to calculate 
densities of states and spectral functions of large sparse 
Hamiltonians using Chebyshev recursion and maximum 
entropy It is a non-linear extension of the kernel polyno- 
mial method. It is capable of handling large numbers of 
moments and non-analytic (singular) structures in den- 
sities of states and spectral functions to achieve high en- 
ergy resolution. The choice of Chebyshev recursion over- 
comes problems of machine precision and ill-conditioning 
found in maximum entropy algorithms for power mo- 
ments. It also circumvents the accumulation of numerical 
roundoff errors in Lanczos recursion. It controls the nu- 
merical accuracy of Fourier integrals by multiplying the 
moment data by Gibbs damping factors appropriate to 
the number of pixels chosen. It avoids endpoint correc- 
tions to the fast Fourier transforms by appropriate scal- 
ing of the Hamiltonian. Our algorithm achieves signifi- 
cant resolution gains over the kernel polynomial method 
for practical physics examples. The cpu time we need to 
find the maximum entropy solution scales approximately 
as the number of pixels times the number of moments. 
For most applications, this time is small compared to the 
cpu time we need to generate the Chebyshev moment 
data. Overall cpu time can scale linearly in the number 
of states if controlled statistical or systematic errors are 
acceptable. 
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The resulting maximum entropy algorithm has some 
tuning parameters to control the balance between numer- 
ical accuracy, convergence, energy resolution and compu- 
tational resources. A fortran 77 program implementing 
our algorithms for KPM and MEM is available from the 
authors by sending e-mail to rns@loke.lanl.gov. It uses 
publically available libraries including DFFTPACK for 
fast Fourier transforms and EISPACK for solving sys- 
tems of linear equations. 
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